“Reuse” of Biblical Quotes in Swedish 19th Century Fiction
Authors
Dimitrios Kokkinakis and Mats Malm

Abstract
Introduction
Relations between texts can be multifaceted: complex, abstract, diverse or subtle. Digital humanists are interested in identifying pairs of text passages likely to contain substantial overlap, and in using them as empirical support for (ideally) new interpretations of historical texts. For instance, Cordell [3] discusses how digital interpretive tools can help make better sense of enlarged bibliographies, and how continued digging into digital archives promises exciting revisions to our literary history. Intertextual similarities [12] between historical texts span a wide range of morphological, linguistic, syntactic, semantic and copying variation, which complicates text-reuse detection. Recycled text chunks are frequently only small portions of a document and may be significantly modified [4,5]. Older language variants and dialects are less standardized, and their evolution spans centuries [1]. Unlike today, verbatim quotes in older texts were, for example, not visually enclosed in quotation marks, which makes it hard to distinguish reused text from ‘original’ text; moreover, some authors quote other authors we know nothing about or whose works have not survived. Spelling and orthographic variation, as well as OCR errors, can also hamper the identification of (historical) text reuse [1,2]. Detecting text reuse is therefore challenging, and NLP has a major role to play in the process. NLP techniques for discovering intertextual similarities between historical texts are a topic of considerable interest among scholars, from both a theoretical and a practical point of view. Biological sequence alignment is one method that has also been used to detect similar passages in text collections [2,5,7,9,10]. We use Pairwise Alignment for Intertextual Relations (PAIR [11]), a simple implementation of sequence alignment for text analysis that supports one-against-many comparisons.
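The alignment idea can be illustrated with a minimal sketch. It assumes word-level Smith-Waterman local alignment, a standard sequence-alignment algorithm, and is not PAIR's actual implementation. The scoring values (match +2, mismatch -1, gap -1), the threshold `min_score`, the spelling rules in `normalize` and all helper names are hypothetical choices, and the example strings are invented toy text loosely modeled on a familiar psalm verse, not data from the study. Local (rather than global) alignment fits the problem because a reused quote usually covers only a small part of a passage, and the mismatch and gap penalties let a match survive small modifications such as a variant spelling.

```python
from typing import List, Tuple

# Hypothetical rules mapping a few older Swedish spellings to modern ones
# (e.g. "hvad" -> "vad", "qvinna" -> "kvinna"); a real system would need far more.
SPELLING_RULES = (("hv", "v"), ("qv", "kv"))


def normalize(token: str) -> str:
    """Lowercase, strip surrounding punctuation, apply the illustrative spelling rules."""
    t = token.lower().strip(".,;:!?\"'()")
    for old, new in SPELLING_RULES:
        t = t.replace(old, new)
    return t


def local_align(a: List[str], b: List[str], match: int = 2,
                mismatch: int = -1, gap: int = -1) -> Tuple[int, int, int]:
    """Word-level Smith-Waterman alignment of token list a against token list b.

    Returns (best_score, start, end), where b[start:end] is the aligned span in b.
    """
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # Local alignment: scores never drop below 0, so an alignment can start anywhere.
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best-scoring cell until the score returns to 0.
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, j, best_j


def find_reuse(quote: str, passages: List[str],
               min_score: int = 6) -> List[Tuple[int, int, str]]:
    """One-against-many search: align one quote against every passage.

    Returns (passage_index, score, matched_text) for passages scoring at least min_score.
    """
    q = [normalize(t) for t in quote.split()]
    hits = []
    for idx, passage in enumerate(passages):
        tokens = passage.split()
        score, start, end = local_align(q, [normalize(t) for t in tokens])
        if score >= min_score:
            hits.append((idx, score, " ".join(tokens[start:end])))
    return hits


# Invented toy text for illustration only.
QUOTE = "Herren är min herde, mig skall intet fattas"
PASSAGES = [
    "Hon sade sakta: Herren är min herde, mig skall intet fattas, och log.",
    "Han gick ut på gatan och såg sig omkring.",
    "Qvinnan tänkte: herren är min herde, mig skal intet fattas.",
]

if __name__ == "__main__":
    for idx, score, text in find_reuse(QUOTE, PASSAGES):
        print(idx, score, text)
```

On these toy strings the sketch reports passages 0 and 2 as hits, the latter with a lower score because `skal` does not match `skall`; passage 1 shares no tokens with the quote and is dropped. The dynamic-programming table costs O(nm) per pair, so a one-against-many search over a large corpus would likely need some candidate filtering first, which this sketch omits.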
Similar resources
Character Profiling in 19th Century Fiction
This paper describes the way in which personal relationships between main characters in 19th century Swedish prose fiction can be identified using information guided by named entities, provided by an entity recognition system adapted to the characteristics of 19th century Swedish. Interpersonal relation extraction is based on the context between two relevant, identified person entities. The re...
Gender-Based Vocation Identification in Swedish 19th Century Prose Fiction using Linguistic Patterns, NER and CRF Learning
This paper investigates how literature could be used as a means to expand our understanding of history. By applying macroanalytic techniques we are aiming to investigate how women enter literature and particularly which functions they assume, their working patterns and if we can spot differences in how often male and female characters are mentioned with various types of occupational titles (...
Semantic search in literature as an e-Humanities research tool: CONPLISIT - Consumption patterns and life-style in 19th century Swedish literature
We present our ongoing work on language technology-based e-science in the humanities, with a focus on text-based research in the historical sciences. Currently, we are working on the adaptation and integration of lexical resources representing different historical stages of Swedish into a lexical and morphological toolbox that will allow us to develop semantically oriented text search applicati...
Novel2Vec: Characterising 19th Century Fiction via Word Embeddings
Recently, considerable attention has been paid to word embedding algorithms inspired by neural network models. Given a large textual corpus, these algorithms attempt to derive a set of vectors which represent the corpus vocabulary in a new embedded space. This representation can provide a useful means of measuring the underlying similarity between words. Here we investigate this property in the...
Naming the Past: Named Entity and Animacy Recognition in 19th Century Swedish Literature
This paper provides a description and evaluation of a generic named-entity recognition (NER) system for Swedish applied to electronic versions of Swedish literary classics from the 19th century. We discuss the challenges posed by these texts and the necessary adaptations introduced into the NER system in order to achieve accurate results, useful not only for metadata generation, but also for the en...